Use the select function to select variables (columns) from a tibble.
Given a tibble, select can be used to keep a subset of variables, drop variables, rename variables and reorder variables.
Let’s take the pulse dataset:
pulse
# A tibble: 110 x 13
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 18 female no yes moderate sat 86 88 1993
2 1993_B Melanie 179 58 19 female no yes moderate ran 82 150 1993
3 1993_C Consuelo 167 62 18 female no yes high ran 96 176 1993
4 1993_D Travis 195 84 18 male no yes high sat 71 73 1993
5 1993_E Lauri 173 64 18 female no yes low sat 90 88 1993
6 1993_F George 184 74 22 male no yes low ran 78 141 1993
7 1993_G Cherry 162 57 20 female no yes moderate sat 68 72 1993
8 1993_H Francesca 169 55 18 female no yes moderate sat 71 77 1993
9 1993_I Sonja 164 56 19 female no yes high sat 68 68 1993
10 1993_J Troy 168 60 23 male no yes moderate ran 88 150 1993
# … with 100 more rows
select takes a tibble as its first argument, followed by a comma-separated list of the variables you choose, and returns a tibble containing only those variables:
select(pulse, name, age)
# A tibble: 110 x 2
name age
<chr> <dbl>
1 Bonnie 18
2 Melanie 19
3 Consuelo 18
4 Travis 18
5 Lauri 18
6 George 22
7 Cherry 20
8 Francesca 18
9 Sonja 19
10 Troy 23
# … with 100 more rows
After this selection, does the pulse tibble still contain the variables ‘name’ and ‘age’?
If you want to keep your selection as a separate tibble, you need to assign the result to a new variable, e.g. pulse_name_age_only:
pulse_name_age_only <- select(pulse, name, age)
pulse_name_age_only
# A tibble: 110 x 2
name age
<chr> <dbl>
1 Bonnie 18
2 Melanie 19
3 Consuelo 18
4 Travis 18
5 Lauri 18
6 George 22
7 Cherry 20
8 Francesca 18
9 Sonja 19
10 Troy 23
# … with 100 more rows
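The effect of the assignment can be checked on a small stand-in tibble. The sketch below assumes the dplyr and tibble packages are installed; pulse_mini and mini_name_age are hypothetical names, and pulse_mini is only a three-row excerpt built from rows shown above, not the real pulse dataset.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for pulse: three rows, four of the columns
pulse_mini <- tibble(
  id     = c("1993_A", "1993_B", "1993_C"),
  name   = c("Bonnie", "Melanie", "Consuelo"),
  age    = c(18, 19, 18),
  smokes = c("no", "no", "no")
)

# select() returns a new tibble; the input tibble is left untouched
mini_name_age <- select(pulse_mini, name, age)

names(mini_name_age)  # only the two selected columns
names(pulse_mini)     # all four original columns are still there
```

Comparing the two names() results shows that the selection lives only in the new variable.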
The order of the selected variables is reflected in the resulting tibble:
select(pulse, age, name)
# A tibble: 110 x 2
age name
<dbl> <chr>
1 18 Bonnie
2 19 Melanie
3 18 Consuelo
4 18 Travis
5 18 Lauri
6 22 George
7 20 Cherry
8 18 Francesca
9 19 Sonja
10 23 Troy
# … with 100 more rows
You may also deselect variables, in other words keep the complement of your selection. This is done with the - sign:
select(pulse, -smokes, -alcohol)
# A tibble: 110 x 11
id name height weight age gender exercise ran pulse1 pulse2 year
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 18 female moderate sat 86 88 1993
2 1993_B Melanie 179 58 19 female moderate ran 82 150 1993
3 1993_C Consuelo 167 62 18 female high ran 96 176 1993
4 1993_D Travis 195 84 18 male high sat 71 73 1993
5 1993_E Lauri 173 64 18 female low sat 90 88 1993
6 1993_F George 184 74 22 male low ran 78 141 1993
7 1993_G Cherry 162 57 20 female moderate sat 68 72 1993
8 1993_H Francesca 169 55 18 female moderate sat 71 77 1993
9 1993_I Sonja 164 56 19 female high sat 68 68 1993
10 1993_J Troy 168 60 23 male moderate ran 88 150 1993
# … with 100 more rows
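When several variables need to be dropped, they can also be grouped with c() behind a single - sign. A minimal sketch, assuming dplyr and tibble are installed; pulse_mini is a hypothetical three-row excerpt of pulse used only for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for pulse with two columns we want to drop
pulse_mini <- tibble(
  name    = c("Bonnie", "Melanie", "Consuelo"),
  age     = c(18, 19, 18),
  smokes  = c("no", "no", "no"),
  alcohol = c("yes", "yes", "yes")
)

# Drop the columns one by one ...
dropped_one_by_one <- select(pulse_mini, -smokes, -alcohol)

# ... or group them with c() and negate the whole group
dropped_together <- select(pulse_mini, -c(smokes, alcohol))

names(dropped_one_by_one)
names(dropped_together)
```

Both calls should leave the same remaining columns, so the choice between them is a matter of readability.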
While selecting, you can rename the variables at the same time, using the form new_name = old_name:
select(pulse, FirstName = name, Age = age)
# A tibble: 110 x 2
FirstName Age
<chr> <dbl>
1 Bonnie 18
2 Melanie 19
3 Consuelo 18
4 Travis 18
5 Lauri 18
6 George 22
7 Cherry 20
8 Francesca 18
9 Sonja 19
10 Troy 23
# … with 100 more rows
What is the variable name in the pulse dataset, ‘Age’ or ‘age’?
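Renaming inside select still drops every variable that is not listed. If the aim is to rename while keeping all columns, dplyr's rename function accepts the same new_name = old_name form. The sketch below assumes dplyr and tibble are installed; pulse_mini is a hypothetical three-row excerpt of pulse.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for pulse
pulse_mini <- tibble(
  id   = c("1993_A", "1993_B", "1993_C"),
  name = c("Bonnie", "Melanie", "Consuelo"),
  age  = c(18, 19, 18)
)

# select() with new_name = old_name keeps only the listed columns
selected <- select(pulse_mini, FirstName = name, Age = age)
names(selected)  # FirstName, Age

# rename() uses the same syntax but keeps every column
renamed <- rename(pulse_mini, FirstName = name, Age = age)
names(renamed)   # id, FirstName, Age
```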
With select we can also reorder the variables. A tibble may have a large number of variables; you can bring the more ‘important’ ones to the front with select and the convenience function everything:
select(pulse, name, age, everything())
# A tibble: 110 x 13
name age id height weight gender smokes alcohol exercise ran pulse1 pulse2 year
<chr> <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 Bonnie 18 1993_A 173 57 female no yes moderate sat 86 88 1993
2 Melanie 19 1993_B 179 58 female no yes moderate ran 82 150 1993
3 Consuelo 18 1993_C 167 62 female no yes high ran 96 176 1993
4 Travis 18 1993_D 195 84 male no yes high sat 71 73 1993
5 Lauri 18 1993_E 173 64 female no yes low sat 90 88 1993
6 George 22 1993_F 184 74 male no yes low ran 78 141 1993
7 Cherry 20 1993_G 162 57 female no yes moderate sat 68 72 1993
8 Francesca 18 1993_H 169 55 female no yes moderate sat 71 77 1993
9 Sonja 19 1993_I 164 56 female no yes high sat 68 68 1993
10 Troy 23 1993_J 168 60 male no yes moderate ran 88 150 1993
# … with 100 more rows
The everything function selects all variables that have not already been listed, here everything other than name and age, and select places them after name and age.
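The reordering can be tried on a small tibble. The sketch below assumes dplyr and tibble are installed; pulse_mini is a hypothetical three-row excerpt of pulse. It also shows relocate, which is available in dplyr 1.0.0 and later and moves the named columns to the front by default, as an alternative to the select and everything combination.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for pulse
pulse_mini <- tibble(
  id     = c("1993_A", "1993_B", "1993_C"),
  name   = c("Bonnie", "Melanie", "Consuelo"),
  height = c(173, 179, 167),
  age    = c(18, 19, 18)
)

# name and age first, then all remaining columns in their original order
front <- select(pulse_mini, name, age, everything())
names(front)      # name, age, id, height

# relocate() moves the named columns to the front without everything()
relocated <- relocate(pulse_mini, name, age)
names(relocated)  # name, age, id, height
```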
Copyright © 2021 Biomedical Data Sciences (BDS) | LUMC